- 19 February, 1985
-
- hdiff 1.14
-
- Purpose
- -------
-
- hdiff is a utility which can compare two standard DOS text files
- and isolate the differences between them. It can produce two
- distinct types of reports on the differences. First, hdiff can
- prepare a simple report of lines which appear in the second file
- but not in the first (insertions), and of lines which appear in
- the first file but not in the second (deletions). Second, hdiff
- can produce a special "report" which is, in fact, an Edlin
- script. This script, when applied to the first file, will
- produce a clone of the second file. This second function of
- hdiff is similar to the Unix utility "diff".
-
- hdiff uses a file comparison algorithm which was developed by
- Paul Heckel and described by D. E. Cortesi in Dr. Dobb's Journal
- #94 (August, 1984). The algorithm is substantially more
- efficient than traditional file comparison methods: its running
- time grows roughly in proportion to the total number of lines, so
- you will find that it can generate a difference report between
- two files in little more than the time it takes to read them.
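-
- The following is a minimal C sketch of the technique as it is
- commonly described. It is not hdiff's source. It assumes both files
- have already been read into arrays of strings, and the names
- heckel_diff, Sym and lookup are hypothetical. A symbol table counts
- how often each distinct line occurs in each file. Lines that occur
- exactly once in both files are matched immediately, and each match
- is then extended forward and backward over identical neighboring
- lines. The small hash table is a stand-in for whatever table hdiff
- actually uses.
-
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One symbol-table entry per distinct line of text (hypothetical layout). */
typedef struct {
    const char *text;  /* NULL marks an empty slot */
    int oc, nc;        /* occurrences in the old and new file */
    int olno;          /* old-file index; meaningful only when oc == 1 */
} Sym;

static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Find the entry for s, creating it if absent (linear probing). */
static Sym *lookup(Sym *tab, size_t cap, const char *s)
{
    size_t i = hash_str(s) % cap;
    while (tab[i].text != NULL && strcmp(tab[i].text, s) != 0)
        i = (i + 1) % cap;
    if (tab[i].text == NULL)
        tab[i].text = s;
    return &tab[i];
}

/* On return nlink[i] is the old index matched to new line i (-1 = inserted)
   and olink[j] is the new index matched to old line j (-1 = deleted).
   Returns 0 on success, -1 if memory runs out. */
int heckel_diff(const char **oldl, int nold, const char **newl, int nnew,
                int *olink, int *nlink)
{
    size_t cap = 2 * (size_t)(nold + nnew) + 1;  /* table stays at most half full */
    Sym *tab = calloc(cap, sizeof *tab);
    Sym **osym = malloc(((size_t)nold + 1) * sizeof *osym);
    Sym **nsym = malloc(((size_t)nnew + 1) * sizeof *nsym);
    int i, j;

    assert(nold >= 0 && nnew >= 0);
    if (tab == NULL || osym == NULL || nsym == NULL) {
        free(tab); free(osym); free(nsym);
        return -1;
    }

    /* Passes 1 and 2: count each line's occurrences in both files. */
    for (i = 0; i < nnew; i++) {
        nsym[i] = lookup(tab, cap, newl[i]);
        nsym[i]->nc++;
        nlink[i] = -1;
    }
    for (i = 0; i < nold; i++) {
        osym[i] = lookup(tab, cap, oldl[i]);
        osym[i]->oc++;
        osym[i]->olno = i;
        olink[i] = -1;
    }

    /* Pass 3: a line occurring exactly once in each file must be the same line. */
    for (i = 0; i < nnew; i++)
        if (nsym[i]->oc == 1 && nsym[i]->nc == 1) {
            nlink[i] = nsym[i]->olno;
            olink[nsym[i]->olno] = i;
        }

    /* Pass 4: extend each match forward over identical unmatched neighbors. */
    for (i = 0; i + 1 < nnew; i++) {
        j = nlink[i];
        if (j >= 0 && j + 1 < nold && nlink[i + 1] < 0 && olink[j + 1] < 0 &&
            nsym[i + 1] == osym[j + 1]) {
            nlink[i + 1] = j + 1;
            olink[j + 1] = i + 1;
        }
    }

    /* Pass 5: extend each match backward in the same way. */
    for (i = nnew - 1; i > 0; i--) {
        j = nlink[i];
        if (j > 0 && nlink[i - 1] < 0 && olink[j - 1] < 0 &&
            nsym[i - 1] == osym[j - 1]) {
            nlink[i - 1] = j - 1;
            olink[j - 1] = i - 1;
        }
    }

    free(tab); free(osym); free(nsym);
    return 0;
}
```
-
- Lines left with nlink[i] == -1 are insertions, and lines with
- olink[j] == -1 are deletions. Consider swapping two functions whose
- bodies are both "{" and "}". Every line stays matched: the unique
- headers anchor the matches, and passes 4 and 5 carry them into the
- bodies. So nothing is reported as inserted or deleted; the lines
- have simply moved.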
-
- This version of hdiff was derived from Cortesi's demonstration
- program, with substantial modifications which
-
- -- accommodate differences between Edlin and CP/M's Ed (for which
- the demo was written)
- -- allow use of Edlin's block move capabilities
- -- allow for much larger files through the use of all available
- memory.
- -- allow the use of command line parameters and switches,
- including case and spacing insensitivity.
- -- allow the user to specify at run time the maximum number of
- lines which will be processed. This allows hdiff to use memory
- more efficiently.
- -- allow the user to request the simpler difference report rather
- than the Edlin script.
-
-
- System requirements
- -------------------
-
- hdiff requires:
-
- -- IBM PC, PC/XT, PC/AT, or other MSDOS machine
- -- MSDOS 2.00 or later
- -- At least 128K of RAM. The more RAM you have, the larger the
- files you can process.
-
-
- Running hdiff
- -------------
-
- The general syntax for hdiff is:
-
- hdiff [-ecs] [-nnnn] oldfile.ext newfile.ext
-
- The optional -e switch instructs hdiff to produce an Edlin script
- file rather than the difference report.
-
- The optional -c switch instructs hdiff to ignore differences in
- case: "HDIFF" is the same as "hdiff".
-
- The optional -s switch instructs hdiff to ignore differences in
- spacing; all spaces and tabs are ignored for comparison purposes.
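-
- One plausible way to implement -c and -s is to compare a normalized
- copy of each line while keeping the original text for the report
- or script. The C sketch below assumes that approach. normalize_line
- is a hypothetical name, not part of hdiff.
-
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy in to out in the form used only for comparison (hypothetical helper):
   with ignore_space set, spaces and tabs are dropped; with ignore_case set,
   letters are folded to lower case. out must hold strlen(in) + 1 chars. */
void normalize_line(const char *in, char *out, int ignore_case, int ignore_space)
{
    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (ignore_space && (c == ' ' || c == '\t'))
            continue;
        *out++ = ignore_case ? (char)tolower(c) : (char)c;
    }
    *out = '\0';
}
```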
-
- The optional -nnnn switch assists in memory management; it
- represents the maximum number of lines hdiff will be required to
- process, i.e., the number of lines in the larger of the two
- files. The default for this value is 2000 lines; there is an
- absolute maximum of 5000 lines. See the section on memory
- management for more information about this switch.
-
- The switches may be combined, and they may be in any order: '-e
- -c -1000', '-1000ce', and '-ce1000' are all equivalent.
-
- Examples:
-
- hdiff foo.c newfoo.c
-
- compares file 'foo.c' with file 'newfoo.c' and produces a simple
- report showing insertions (lines in newfoo which do not appear in
- foo) and deletions (lines in foo which do not appear in newfoo).
- Lines which have been moved but are otherwise unchanged do not
- appear in this report.
-
- hdiff -ec foo.c newfoo.c
-
- compares foo.c with newfoo.c, ignoring case differences, and
- prepares an Edlin script. This script, if applied to foo, will
- create a copy of newfoo. The script file is sent to the console,
- so a more useful command is
-
- hdiff -e foo.c newfoo.c > foo.dat
-
- which uses standard DOS redirection to send the Edlin script to
- the disk file foo.dat. Note that the program logo and error
- messages are unaffected by redirection and will always be sent to
- the screen.
-
- hdiff -e4000 foo.c newfoo.c > foo.dat
-
- is equivalent to the previous command, except that it informs
- hdiff that one of the files might contain up to 4000 lines.
-
-
- Report formats
- --------------
-
- The difference report consists of lines in the format:
-
- nnnn+ text
- or
- nnnn- text
-
- The '+' format indicates that the line is new (an insertion); the
- '-' format indicates that the line is gone (a deletion). Thus:
-
- 0001- This line appears in the old file only
- 0001+ This line appears in the new file only
-
- The 'nnnn' represents the line number. For '+' lines, it's the
- line number in the new file; for '-' lines, it's the line number
- in the old file.
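-
- Here is a minimal C sketch of producing this report. It assumes the
- comparison has already marked each line as matched or unmatched
- (here, an index into the other file, or -1). The function names are
- hypothetical. Printing all deletions before all insertions is an
- assumed ordering; hdiff's actual ordering may differ.
-
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one report line as "nnnn+ text" or "nnnn- text".
   lineno is 1-based, as in the examples above. Hypothetical helper. */
int format_report_line(char *buf, size_t size, int lineno, char sign,
                       const char *text)
{
    assert(sign == '+' || sign == '-');
    return snprintf(buf, size, "%04d%c %s", lineno, sign, text);
}

/* Write every unmatched old line as a deletion, then every unmatched new
   line as an insertion. olink/nlink hold the matching index in the other
   file, or -1 for an unmatched line (assumed representation). */
void write_report(FILE *fp, const char **oldl, const int *olink, int nold,
                  const char **newl, const int *nlink, int nnew)
{
    char buf[300];  /* room for a 255-character line plus the prefix */
    int i;
    for (i = 0; i < nold; i++)
        if (olink[i] < 0) {
            format_report_line(buf, sizeof buf, i + 1, '-', oldl[i]);
            fprintf(fp, "%s\n", buf);
        }
    for (i = 0; i < nnew; i++)
        if (nlink[i] < 0) {
            format_report_line(buf, sizeof buf, i + 1, '+', newl[i]);
            fprintf(fp, "%s\n", buf);
        }
}
```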
-
- The Edlin script is a series of Edlin commands. See Edlin
- documentation for their meanings; the only commands which will
- appear are I (insert), D (delete), M (move), and E (End). The
- script may look a little strange if you view it (with an
- editor or via the TYPE command): after each insertion sequence
- there is a heart symbol. This is the screen representation of
- Ctrl-C, the character used to terminate an Edlin insertion.
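-
- The sketch below shows how such a script might be assembled in C
- from the I, D and E commands. The helper names are hypothetical and
- block moves (M) are omitted. It assumes the Ctrl-C character
- (code 3) is written on a line of its own. It also assumes changes
- are emitted from the bottom of the file upward, so that no command
- disturbs the line numbers used by the commands after it.
-
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CTRL_C '\003'   /* ends an Edlin insertion; TYPE shows it as a heart */

/* Append s to the NUL-terminated script in buf (hypothetical helper). */
static void put(char *buf, size_t size, const char *s)
{
    size_t used = strlen(buf);
    assert(used < size);
    snprintf(buf + used, size - used, "%s", s);
}

/* Delete lines from..to (1-based, inclusive): "from,toD". */
void edlin_delete(char *buf, size_t size, int from, int to)
{
    char cmd[32];
    snprintf(cmd, sizeof cmd, "%d,%dD\n", from, to);
    put(buf, size, cmd);
}

/* Insert n lines before line 'before': "beforeI", the text, then Ctrl-C. */
void edlin_insert(char *buf, size_t size, int before, const char **lines, int n)
{
    char cmd[32];
    int i;
    snprintf(cmd, sizeof cmd, "%dI\n", before);
    put(buf, size, cmd);
    for (i = 0; i < n; i++) {
        put(buf, size, lines[i]);
        put(buf, size, "\n");
    }
    cmd[0] = CTRL_C;
    cmd[1] = '\n';
    cmd[2] = '\0';
    put(buf, size, cmd);
}

/* End the script: E saves the file and leaves Edlin. */
void edlin_end(char *buf, size_t size)
{
    put(buf, size, "E\n");
}
```
-
- When the finished buffer is written to a text-mode stream by a DOS
- C compiler, each '\n' becomes the CR LF pair that DOS text files
- use.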
-
-
- Uses
- ----
-
- The simplest use for hdiff is to compare two files to see if they
- are the same. This can be used to check for corruption during
- backups, copies, etc., or to determine which of two files is
- newer. Even this simple use of hdiff can be useful in unexpected
- ways, however. For example, look at this small batch file:
-
- dir a: > temp
- find "-" temp > dir.a
- dir b: > temp
- find "-" temp > dir.b
- hdiff dir.b dir.a > temp.bat
- erase dir.a
- erase dir.b
- erase temp
-
- This batch file can be used for a simple backup system. Assume that
- the default directory in drive A contains a series of files that
- you want to back up, and that the default directory in drive B
- contains the same set of files from the last backup. The find
- commands keep only directory lines containing a hyphen. These are
- the file entries, whose dates contain hyphens, so hdiff compares
- just the file entries. In the resulting temp.bat, the '+' lines
- list the files which have been changed or added since the last
- backup. The '-' lines list entries found only on drive B: older
- copies of changed files, or files no longer on A. (The .bat
- extension is used because many popular text editors can easily
- convert temp.bat into a series of copy commands, which can then be
- run as a batch file to perform the copying.)
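-
- If you would rather not do the conversion in an editor, here is a
- small C sketch that turns one '+' line of temp.bat into a copy
- command. It assumes the DOS 2.x DIR layout (file name in the first
- 8 columns, extension in columns 10 to 12). It also assumes changed
- files are to be copied from A: to B:. report_to_copy is a
- hypothetical name.
-
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy an n-character fixed-width field into dst, dropping trailing blanks. */
static void field(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
    dst[n] = '\0';
    while (n > 0 && dst[n - 1] == ' ')
        dst[--n] = '\0';
}

/* Turn a report line such as
   "0003+ FOO      C         1234   2-19-85  10:00a"
   into "copy a:FOO.C b:". Returns 1 if a command was written, 0 for
   deletions or lines too short to hold a directory entry.
   ASSUMES name in the entry's first 8 columns and extension in columns
   10-12, after hdiff's 6-character "nnnn+ " prefix. */
int report_to_copy(const char *line, char *out, size_t size)
{
    char name[9], ext[4];
    const char *entry;

    assert(line != NULL && out != NULL);
    if (strlen(line) < 6 + 12 || line[4] != '+')
        return 0;
    entry = line + 6;              /* skip the "nnnn+ " prefix */
    field(name, entry, 8);
    field(ext, entry + 9, 3);
    if (ext[0] != '\0')
        snprintf(out, size, "copy a:%s.%s b:", name, ext);
    else
        snprintf(out, size, "copy a:%s b:", name);
    return 1;
}
```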
-
- The "Edlin" mode has a potentially much more significant use.
- Perhaps its greatest potential lies in what are known as "source
- code control systems". These systems, quite common in mainframe
- and minicomputer systems, allow programmers to maintain many
- generations of program source text quite economically; rather
- than storing each modified file in its entirety, only the
- original is saved, along with a series of difference files.
-
- Hdiff provides a first step in this direction for MSDOS machines
- (see the "Plans" section below). Typical use of the current
- hdiff would be something like this. Assume that hdiff.scc
- contains an "original" version of hdiff; the current version
- (1.10) is hdiff.c. First, the command
-
- hdiff -e hdiff.scc hdiff.c > hdiff.110
-
- will create an Edlin script which would convert hdiff.scc into
- version 1.10 of hdiff.c. Typically, the actual hdiff.c file
- would then be discarded (WARNING: see below. This program is
- experimental!) As newer versions are developed, the same
- procedure is used to create hdiff.111, hdiff.120, etc. Note that
- these difference files would, in all likelihood, be much smaller
- than the total size of all of the versions.
-
- To "retrieve" an earlier version, say 1.00, the commands
-
- copy hdiff.scc hdiff.c
- edlin hdiff.c < hdiff.100
-
- would copy hdiff.scc to hdiff.c and then convert hdiff.c into
- version 1.00 of hdiff.
-
- True source code control systems are considerably more efficient
- than this "by hand" method, are much easier to use, and provide
- significant features beyond mere storage of multiple versions.
-
- For whatever it's worth, note that
-
- hdiff -e file1 file2 | edlin file1
-
- is roughly equivalent to
-
- copy file2 file1
-
- except that the original file1 is saved in file1.bak.
-
-
- cdelta and cget
- ---------------
-
- The two demonstration batches, cdelta and cget, provide a quick
- sample of the kinds of things that can be done with hdiff and
- edlin. The two batches are designed for C programs; to revise
- them for other languages, simply replace all references to ".c"
- with the desired extension (.asm, for example).
-
- The purpose of cdelta is to generate a change script which will
- convert a "base" source file into a specified version of your
- source. Cget performs the inverse task; it applies a specified
- change file to the base and produces a file containing the
- specified version. File naming conventions are as follows:
-
- file.scc: "base" source; scc = source code control
- file.###: A change script to produce version ###
- file.c: The current version (cdelta), or the
- output file (cget)
-
- For example, suppose you are working with a C program called foo.
- A base (earliest) version of this file should be in foo.scc. You
- have just finished revision 1.10 of foo. To create the change
- file, type
-
- cdelta foo 110
-
- The batch will create a new file, foo.110; this file is an Edlin
- script which will convert foo.scc into version 1.10 of foo.c.
-
- To retrieve a specified version, say 1.05, use
-
- cget foo 105
-
- The batch will apply the script foo.105 to foo.scc and produce
- foo.c, which will contain the source for version 1.05.
-
- Note that cget always creates a file called file.c, overwriting
- any existing file by that name. This implies that you should NOT
- keep your only copy of the current source in file.c. The current
- source is preserved only by retaining file.scc and the delta files,
- so run cdelta before cget whenever file.c holds unsaved changes.
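-
- The batch files themselves are not reproduced here. As a rough
- sketch only, assume they run the same commands shown in the "Uses"
- section. The following C helpers then build those command lines
- from a base name and a version number; both helper names are
- hypothetical. On DOS, each resulting string could be passed to
- system().
-
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* cdelta base ver  ->  "hdiff -e base.scc base.c > base.ver" */
int cdelta_cmd(char *buf, size_t size, const char *base, const char *ver)
{
    assert(strlen(ver) <= 3);   /* a DOS extension holds at most 3 characters */
    return snprintf(buf, size, "hdiff -e %s.scc %s.c > %s.%s",
                    base, base, base, ver);
}

/* cget base ver: restore the base as base.c, then replay the change script. */
void cget_cmds(char *copy_cmd, char *edit_cmd, size_t size,
               const char *base, const char *ver)
{
    assert(strlen(ver) <= 3);
    snprintf(copy_cmd, size, "copy %s.scc %s.c", base, base);
    snprintf(edit_cmd, size, "edlin %s.c < %s.%s", base, base, ver);
}
```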
-
-
- Memory management
- -----------------
-
- Hdiff uses all available memory. The purpose of the -nnnn (max
- number of lines) switch is to allow it to use memory more
- efficiently, and to allow you to more effectively use hdiff in
- very small or very large machines. This is how it works.
-
- For each *potential* line, hdiff requires approximately 34 bytes
- of storage for various tables. The default configuration (space
- for 2000 lines) will thus require about 68K bytes of data space
- for the tables. The remainder of available memory (less the size
- of the program itself and a much smaller amount of overhead data)
- is used to store the text read from the files. Text storage
- space is required for each *unique* line in either file.
-
- If you have a small machine (i.e., less RAM), that much table
- space will leave very little room for text storage; it may even
- be more space than is available, and the program will not run at
- all. If you find this to be the case, try reducing the number of
- lines via the switch (-1000, or -500, for example.)
-
- Conversely, if you have a very large machine, you will have
- plenty of space available to process files larger than 2000 lines.
- If that is the case, increase the -nnnn value as necessary
- (but remember that in no case can it exceed 5000).
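-
- The arithmetic is easy to check. 2000 lines at 34 bytes each is
- 68,000 bytes, and the 5000-line maximum needs about 170,000 bytes
- of tables before any text is stored. Here is a small C sketch of the
- calculation, using the approximate 34-byte figure above. The helper
- names are hypothetical.
-
```c
#include <assert.h>

#define BYTES_PER_LINE 34L      /* approximate table cost per potential line */
#define ABSOLUTE_MAXLINES 5000L /* upper limit for -nnnn */

/* Approximate table space reserved for a given -nnnn value. */
long table_bytes(long maxlines)
{
    assert(maxlines > 0 && maxlines <= ABSOLUTE_MAXLINES);
    return maxlines * BYTES_PER_LINE;
}

/* Largest -nnnn value whose tables fit in 'budget' bytes, capped at 5000.
   How much memory to leave for text storage is up to the caller. */
long maxlines_for(long budget)
{
    long n = budget / BYTES_PER_LINE;
    return n > ABSOLUTE_MAXLINES ? ABSOLUTE_MAXLINES : n;
}
```
-
- Working backwards, a table budget of 17,000 bytes corresponds to
- -500, and 34,000 bytes to -1000.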
-
- When hdiff is finished, it displays a message like:
-
- Storage use: 19%
-
- This message tells you approximately what percentage of the total
- available memory was actually used.
-
-
- Restrictions
- ------------
-
- The following act, in one way or another, as restrictions on
- hdiff:
-
- -- File format. Hdiff is intended as a DOS text file comparator
- only. It is NOT a replacement for the DOS utility 'comp'. Don't
- use it on binary (program or data) files, or on word processor
- files if they contain embedded control codes.
- -- Available memory (as discussed above)
- -- Actual size of the files. Edlin will read a file only until
- 75% of its available memory is filled. Since Edlin uses only a
- maximum of 64K, this means that it will read only 48K of text.
- Hdiff cannot account for this problem, so the absolute maximum
- file size it can handle is approximately 48K.
- -- Line size. Limited to a maximum of 255 characters/line.
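-
- These limits can be checked before running hdiff. The C sketch
- below scans a file's contents in memory and reports which limits
- it would exceed. The flag names are hypothetical, and 48K is taken
- as 48 x 1024 bytes. It also assumes that any control character other
- than tab, CR, LF and Ctrl-Z (the DOS end-of-file mark) signals a
- binary or word processor file.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EDLIN_TEXT_LIMIT (48L * 1024)  /* 75% of Edlin's 64K */
#define MAX_LINE_LEN 255

enum { TOO_BIG = 1, LINE_TOO_LONG = 2, CONTROL_CHARS = 4 };

/* Return a mask of the restrictions a file's contents would violate;
   0 means hdiff and Edlin should be able to handle it. */
int check_text(const char *buf, size_t len)
{
    int flags = 0;
    size_t i, col = 0;

    assert(buf != NULL || len == 0);
    if ((long)len > EDLIN_TEXT_LIMIT)
        flags |= TOO_BIG;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c == '\n') {
            col = 0;                   /* start of a new line */
        } else if (c != '\r') {        /* CR of the CR LF pair is not counted */
            if (c < 32 && c != '\t' && c != 0x1A)
                flags |= CONTROL_CHARS;
            if (++col > MAX_LINE_LEN)
                flags |= LINE_TOO_LONG;
        }
    }
    return flags;
}
```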
-
-
- A Warning and A Plan
- --------------------
-
- Hdiff is experimental! It has been in use for about six months
- (as of 19 Feb 1985) with no known errors, but this is NOT to
- say that you should entrust your only copy of a source file to
- hdiff! Please bear this in mind as you use it. Please report
- any problems to me.
-
- I intend, at some "unspecified future time", to incorporate hdiff
- or a version of it in a larger source code control system. This
- system would allow you to maintain multiple generations of
- program source files very efficiently (in terms of storage
- requirements). Some knotty problems relating to performance on a
- standard-issue PC remain to be solved. Comments or suggestions
- relating to this system are welcome. Tell me what you would like
- to see. In the meantime, a temporary "system" is available in the
- file "sccs.lbr", which contains simple versions of get and delta
- (written in C for performance reasons).
-
-
-
- ---------------------
-
- hdiff and this document are
- Copyright (c) 1984, 1985 by:
-
- Christopher J. Dunford
- 10057-2 Windstream Drive
- Columbia, Maryland 21044
- CompuServe 76703,2002
- Source STR211
-
- You may copy and use hdiff for your personal use only. You may
- copy hdiff for others, but you may not charge them for it. You
- may not use hdiff for any commercial purpose whatsoever. Address
- comments to the author at the above address, at CompuServe
- (preferably) or at the Source (occasionally).
-
- Hdiff is written in C and compiled using the Computer Innovations
- C86 compiler (Version 2.13), large model.